Rule-Based Spanish Morphological Analyzer Built From Spell Checking Lexicon
Abstract
Preprocessing tools for automated text analysis have become more widely available for major languages, but non-English tools are often still limited in functionality. When working with Spanish-language text, researchers can easily find tools for tokenization and stemming, but may lack the means to extract more complex word features such as verb tense or mood. Yet Spanish is a morphologically rich language in which such features are often identifiable from word form. Conjugation rules are largely consistent, but many special verbs and nouns follow their own rules. While building a complete dictionary of known words and their morphological rules would be labor-intensive, the resources to do so already exist in spell checkers, which are designed to generate valid forms of known words. This paper introduces a set of tools for Spanish-language morphological analysis, built using the COES spell checking tools, to label person, mood, tense, gender and number, derive a word’s root noun or verb infinitive, and convert verbs to their nominal form.
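The idea of reusing spell-checker rules for analysis can be illustrated with a minimal sketch. A spell checker generates valid forms by attaching endings to known words; running the same rules in reverse strips an ending, restores the infinitive, and checks the result against the lexicon. The rule table, the lexicon, and the `analyze` function below are hypothetical toy examples covering a few regular -ar and -er endings. They are not the COES rule set, its file format, or the paper's implementation.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    suffix: str       # ending observed on the inflected form
    inf_ending: str   # infinitive ending to restore after stripping
    features: dict    # morphological labels implied by this ending


# Hypothetical toy rules for a few regular endings (assumption, not COES data).
RULES = [
    Rule("o", "ar", {"person": 1, "number": "singular",
                     "tense": "present", "mood": "indicative"}),
    Rule("aron", "ar", {"person": 3, "number": "plural",
                        "tense": "preterite", "mood": "indicative"}),
    Rule("ieron", "er", {"person": 3, "number": "plural",
                         "tense": "preterite", "mood": "indicative"}),
]

# Hypothetical lexicon of known infinitives.
LEXICON = {"hablar", "cantar", "comer"}


def analyze(word: str) -> list[dict]:
    """Return every analysis whose restored infinitive is in the lexicon."""
    analyses = []
    for rule in RULES:
        if word.endswith(rule.suffix):
            stem = word[: len(word) - len(rule.suffix)]
            lemma = stem + rule.inf_ending
            if lemma in LEXICON:
                analyses.append({"lemma": lemma, **rule.features})
    return analyses


if __name__ == "__main__":
    for form in ("hablaron", "comieron", "canto", "mesa"):
        print(form, analyze(form))
```

Checking the restored infinitive against the lexicon is what keeps a word like "mesa" from receiving a spurious verb analysis. A real rule set would also need to handle irregular stems and forms that admit more than one analysis.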
Related resources
A compiler for phonological rules
This is a report of an implementation of a compiler for phonological rules. The implementation processes files written in lexc (Karttunen, 1993) annotation and produces data suitable for processing with SWI-Prolog (Wielemaker, 2005). The output is a transducer (a Mealy machine, to be more precise), a finite-state machine which not only accepts but also translates its input. Such machines can be...
XUXEN: A Spelling Checker/Corrector for Basque Based on Two-Level Morphology
The application of the formalism of two-level morphology to Basque and its use in the elaboration of the XUXEN spelling checker/corrector are described. This application is intended to cover a large part of the language. Because Basque is a highly inflected language, the approach of spelling checking and correction has been conceived as a by-product of a general purpose morphological analyzer/...
Pars Morph: A Morphological Analyzer for Persian
In this paper, the theoretical foundation, implementation and uses of Pars Morph, a Persian morphological analyzer, are introduced. Pars Morph is a rule-based Persian morphological analysis system that analyzes the internal structure of Persian words and determines the grammatical category and function of the word parts. Pars Morph being in link with a lexicon covering about 45...
SWATAC: A Sentiment Analyzer using One-Vs-Rest Logistic Regression
This paper describes SWATAC, a system built for SemEval-2015’s Task 10 Subtask B, namely the Message Polarity Classification Task. Given a tweet, the system classifies the sentiment as either positive, negative, or neutral. Several preprocessing tasks such as negation detection, spell checking, and tokenization are performed to enhance lexical information. The features are then augmented with e...
Building ancient Spanish dictionaries for spell-checking of DL texts
Being aware of the usefulness of spell-checkers on the correction of modern works, and lacking this facility for ancient texts, we decided to build dictionaries for ancient Spanish. This decision led to new problems and new questions. We have built a time-aware system of dictionaries that takes into account the temporal dynamics of language, to help solve the problem of ancient Spanish spell-ch...
Journal: CoRR
Volume: abs/1707.07331
Publication date: 2017